Efficient Knowledge Distillation from an Ensemble of Teachers
Authors
Abstract
This paper describes the effectiveness of knowledge distillation using teacher-student training for building accurate and compact neural networks. We show that with knowledge distillation, information from multiple acoustic models, such as very deep VGG networks and Long Short-Term Memory (LSTM) models, can be used to train standard convolutional neural network (CNN) acoustic models for a variety of systems requiring a quick turnaround. We examine two strategies to leverage multiple teacher labels for training student models. In the first technique, the weights of the student model are updated by switching teacher labels at the minibatch level. In the second method, student models are trained on multiple streams of information from various teacher distributions via data augmentation. We show that standard CNN acoustic models can achieve comparable recognition accuracy with a much smaller number of model parameters than the teacher VGG and LSTM acoustic models. Additionally, we investigate the effectiveness of using broadband teacher labels as privileged knowledge for training better narrowband acoustic models within this framework. We show the benefit of this simple technique by training narrowband student models with broadband teacher soft labels on the Aurora 4 task.
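The following is a minimal PyTorch sketch, for illustration only, of the two multi-teacher strategies described in the abstract: switching the teacher providing soft labels at the minibatch level, and replicating each minibatch across the teacher distributions as a form of data augmentation. The model classes, data loader, temperature value, and function names are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of multi-teacher knowledge distillation.
# "student" and "teachers" are assumed to be pre-built acoustic models
# (e.g., a CNN student and VGG/LSTM teachers); "loader" yields feature batches.
import random
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross entropy between teacher soft labels and student predictions."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()

def train_with_teacher_switching(student, teachers, loader, optimizer):
    """Strategy 1: pick one teacher per minibatch and follow its soft labels."""
    student.train()
    for features, _ in loader:
        teacher = random.choice(teachers)        # switch teacher at the minibatch level
        with torch.no_grad():
            teacher_logits = teacher(features)   # soft labels from the chosen teacher
        loss = distillation_loss(student(features), teacher_logits)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def train_with_augmented_streams(student, teachers, loader, optimizer):
    """Strategy 2: present each minibatch once per teacher distribution
    (data augmentation) and average the distillation losses."""
    student.train()
    for features, _ in loader:
        losses = []
        for teacher in teachers:
            with torch.no_grad():
                teacher_logits = teacher(features)
            losses.append(distillation_loss(student(features), teacher_logits))
        loss = torch.stack(losses).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The same distillation loss could, under the broadband/narrowband setting mentioned above, take soft labels from a broadband teacher while the student consumes narrowband features; that pairing is an assumption about how the privileged-knowledge setup might look, not a description of the authors' exact recipe.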
Similar Resources
Born Again Neural Networks
Knowledge distillation techniques seek to transfer knowledge acquired by a learned teacher model to a new student model. In prior work, the teacher is typically a high-capacity model with formidable performance, while the student is more compact. By transferring knowledge, one hopes to benefit from the student's compactness while suffering only minimal degradation in performance. In this paper,...
Qubit and Entanglement assisted Optimal Entanglement Concentration
We present two methods for optimal entanglement concentration from pure entangled states by local actions only; however, prior knowledge of the Schmidt coefficients is required. The first method is optimally efficient only when a finite ensemble of pure entangled states is available, whereas the second method realizes the single-pair optimal concentration probability. We also propose an entang...
On-line Learning of an Unlearnable True Teacher through Mobile Ensemble Teachers
On-line learning of a hierarchical learning model is studied by a method from statistical mechanics. In our model, a student simple perceptron learns not from the true teacher directly, but from ensemble teachers who learn from the true teacher with a perceptron learning rule. Since the true teacher and the ensemble teachers are expressed as a non-monotonic perceptron and simple perceptrons, respectively, ...
The Impact of Collegial Instruction on Peers’ Pedagogical Knowledge (PK): An EFL Case Study
Shared responsibilities such as mentoring, instruction, learner monitoring, and classroom management enable peers to observe, review, reflect on, and learn from the overall practical professional expertise of one another through the collegial instruction experience. The present exploratory case study has attempted to study collegial teaching as an innovative...
Ensemble Distillation for Neural Machine Translation
Knowledge distillation describes a method for training a student network to perform better by learning from a stronger teacher network. In this work, we run experiments with different kinds of teacher networks to enhance the translation performance of a student Neural Machine Translation (NMT) network. We demonstrate techniques based on an ensemble and a best BLEU teacher network. We also show ...